By Renaud Luc RICHARDET
Abstract
In neuroscience, as in many other scientific domains, the primary form of knowledge dissemination is through published articles in peer-reviewed journals. One challenge for modern neuroinformatics is to design methods that make the knowledge from the tremendous backlog of publications accessible for search, analysis, and integration into computational models. In this thesis, we introduce novel natural language processing (NLP) models and systems to mine the neuroscientific literature. In addition to in vivo, in vitro or in silico experiments, we coin the NLP methods developed in this thesis as in litero experiments, aiming at analyzing and making accessible the extended body of neuroscientific literature. In particular, we focus on two important neuroscientific entities: brain regions and neural cells. An integrated NLP model, braiNER, is designed to automatically extract brain region connectivity statements from very large corpora. This system is applied to a large corpus of 25M PubMed abstracts and 600K full-text articles. Central to this system is the creation of a searchable database of brain region connectivity statements, allowing neuroscientists to gain an overview of all brain regions connected to a given region of interest. More importantly, the database enables researchers to provide feedback on connectivity results and links back to the original article sentence to provide the relevant context. The database is evaluated by neuroanatomists on real connectomics tasks (targets of Nucleus Accumbens) and results in a significant reduction in effort compared with previous manual methods (from 1 week to 2 hours). Subsequently, we introduce neuroNER to identify, normalize and compare instances of neurons in the scientific literature.
Our method relies on identifying and analyzing each of the domain features used to annotate a specific neuron mention, such as the morphological term “basket” or the brain region “hippocampus”. We apply our method to the same corpus of 25M PubMed abstracts and 600K full-text articles and find over 500K unique neuron type mentions. To demonstrate the utility of our approach, we also apply our method to cross-compare the NeuroLex and Human Brain Project (HBP) cell type ontologies. By decoupling a neuron mention’s identity into its specific compositional features, our method can successfully identify specific neuron types even if they are not explicitly listed within a predefined neuron type lexicon, thus greatly facilitating cross-laboratory studies. In order to build such large databases, several tools and infrastructures were developed: a robust pipeline to preprocess full-text PDF articles, as well as bluima, an NLP processing pipeline specialized for neuroscience to perform text mining at PubMed scale.
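The idea of decoupling a neuron mention into compositional features can be illustrated with a minimal Python sketch. It is not the neuroNER implementation: the lexicons, the `NORMALIZE` table, and the functions `decompose` and `same_neuron_type` are hypothetical names introduced here for illustration, and the vocabularies are toy examples built around the "basket" and "hippocampus" terms mentioned above.

```python
import re

# Hypothetical toy lexicons (assumption): the actual vocabularies are not
# given in the abstract. Each feature kind maps to a set of canonical terms.
FEATURE_LEXICONS = {
    "morphology": {"basket", "pyramidal", "stellate"},
    "brain_region": {"hippocampus", "neocortex", "cerebellum"},
    "neurotransmitter": {"gabaergic", "glutamatergic"},
}

# Hypothetical normalization table mapping surface variants to canonical terms.
NORMALIZE = {
    "hippocampal": "hippocampus",
    "neocortical": "neocortex",
    "cerebellar": "cerebellum",
}


def decompose(mention):
    """Return the set of (feature kind, canonical term) pairs found in a mention."""
    tokens = re.findall(r"[a-z]+", mention.lower())
    features = set()
    for token in tokens:
        canonical = NORMALIZE.get(token, token)
        for kind, vocabulary in FEATURE_LEXICONS.items():
            if canonical in vocabulary:
                features.add((kind, canonical))
    return frozenset(features)


def same_neuron_type(mention_a, mention_b):
    """Treat two mentions as the same type if they share a non-empty feature set."""
    features_a = decompose(mention_a)
    features_b = decompose(mention_b)
    return bool(features_a) and features_a == features_b
```

Under these assumptions, `same_neuron_type("hippocampal basket cell", "basket cells of the hippocampus")` compares the two mentions by their extracted features rather than by their surface strings, so neither phrasing needs to appear verbatim in a predefined list of neuron types.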
Similar resources
Bluima: a UIMA-based NLP Toolkit for Neuroscience
This paper describes Bluima, a natural language processing (NLP) pipeline focusing on the extraction of neuroscientific content and based on the UIMA framework. Bluima builds upon models from biomedical NLP (BioNLP) like specialized tokenizers and lemmatizers. It adds further models and tools specific to neuroscience (e.g. named entity recognizer for neuron or brain region mentions) and provide...
CLINICAL TRIALS AND OBSERVATIONS Maintenance therapy with thalidomide improves survival in patients with multiple myeloma
Michel Attal, Jean-Luc Harousseau, Serge Leyvraz, Chantal Doyen, Cyrille Hulin, Lofti Benboubker, Ibrahim Yakoub Agha, Jean-Henri Bourhis, Laurent Garderet, Brigitte Pegourie, Charles Dumontet, Marc Renaud, Laurent Voillat, Christian Berthou, Gerald Marit, Mathieu Monconduit, Denis Caillot, Bernard Grobois, Herve Avet-Loiseau, Philippe Moreau, and Thierry Facon, for the Inter-Groupe Francophone...
Equations aux dérivées partielles, Evolutions de courbes et de surfaces et espaces d'echelle: Applications à la vision par ordinateur
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau...